Compton
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Apertus, Project, Hernández-Cano, Alejandro, Hägele, Alexander, Huang, Allen Hao, Romanou, Angelika, Solergibert, Antoni-Joan, Pasztor, Barna, Messmer, Bettina, Garbaya, Dhia, Ďurech, Eduard Frank, Hakimi, Ido, Giraldo, Juan García, Ismayilzada, Mete, Foroutan, Negar, Moalla, Skander, Chen, Tiancheng, Sabolčec, Vinko, Xu, Yixuan, Aerni, Michael, AlKhamissi, Badr, Mariñas, Inés Altemir, Amani, Mohammad Hossein, Ansaripour, Matin, Badanin, Ilia, Benoit, Harold, Boros, Emanuela, Browning, Nicholas, Bösch, Fabian, Böther, Maximilian, Canova, Niklas, Challier, Camille, Charmillot, Clement, Coles, Jonathan, Deriu, Jan, Devos, Arnout, Drescher, Lukas, Dzenhaliou, Daniil, Ehrmann, Maud, Fan, Dongyang, Fan, Simin, Gao, Silin, Gila, Miguel, Grandury, María, Hashemi, Diba, Hoyle, Alexander, Jiang, Jiaming, Klein, Mark, Kucharavy, Andrei, Kucherenko, Anastasiia, Lübeck, Frederike, Machacek, Roman, Manitaras, Theofilos, Marfurt, Andreas, Matoba, Kyle, Matrenok, Simon, Mendonça, Henrique, Mohamed, Fawzi Roberto, Montariol, Syrielle, Mouchel, Luca, Najem-Meyer, Sven, Ni, Jingwei, Oliva, Gennaro, Pagliardini, Matteo, Palme, Elia, Panferov, Andrei, Paoletti, Léo, Passerini, Marco, Pavlov, Ivan, Poiroux, Auguste, Ponkshe, Kaustubh, Ranchin, Nathan, Rando, Javi, Sauser, Mathieu, Saydaliev, Jakhongir, Sayfiddinov, Muhammad Ali, Schneider, Marian, Schuppli, Stefano, Scialanga, Marco, Semenov, Andrei, Shridhar, Kumar, Singhal, Raghav, Sotnikova, Anna, Sternfeld, Alexander, Tarun, Ayush Kumar, Teiletche, Paul, Vamvas, Jannis, Yao, Xiaozhe, Zhao, Hao, Ilic, Alexander, Klimovic, Ana, Krause, Andreas, Gulcehre, Caglar, Rosenthal, David, Ash, Elliott, Tramèr, Florian, VandeVondele, Joost, Veraldi, Livio, Rajman, Martin, Schulthess, Thomas, Hoefler, Torsten, Bosselut, Antoine, Jaggi, Martin, Schlag, Imanol
We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting `robots.txt` exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.
- Europe > Austria > Vienna (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- (30 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Personal > Interview (0.67)
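The Goldfish objective mentioned in the Apertus abstract works by excluding a pseudorandom, context-determined subset of tokens from the next-token loss, so the model never receives a gradient toward the exact continuation of any training sequence. A minimal sketch of the context-hash masking idea — the function name and parameter defaults (`k`, `h`) are illustrative, not Apertus's actual implementation:

```python
import hashlib

def goldfish_dropped_positions(tokens, k=4, h=13):
    """Indices excluded from the training loss: a position is dropped
    when a hash of its h-token left context lands in bucket 0 of k,
    so roughly 1/k of positions never contribute a gradient."""
    dropped = []
    for t in range(len(tokens)):
        context = tokens[max(0, t - h):t]
        digest = hashlib.sha256(repr(context).encode()).digest()
        if digest[0] % k == 0:
            dropped.append(t)
    return dropped

# The mask is deterministic: the same passage always drops the same
# positions, so repeated epochs cannot fill in the masked tokens.
mask = goldfish_dropped_positions(list(range(200)))
```

Because the mask depends only on local context, duplicated documents across the corpus are masked identically, which is what suppresses verbatim recall while leaving ~(1 - 1/k) of tokens for ordinary learning.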
Democrat moves to block Trump admin from using military drones to monitor protests after LA riots
A House Democrat is moving to block the Trump administration from being able to use military-grade drones to surveil protests in the U.S. Rep. Jimmy Gomez, D-Calif., introduced the bill in response to the Department of Homeland Security (DHS) reportedly using MQ-9 Reaper drones to monitor the protests in Los Angeles earlier this year. "The U.S. government should never use military drones to spy on its own people. Not under anyone," Gomez told Fox News Digital in a statement. "This bill would stop Trump's abuse of power and get these combat drones out of our neighborhoods." An MQ-9 Reaper flies by on a training mission at Creech Air Force Base in Indian Springs, Nevada.
- North America > United States > California > Los Angeles County > Los Angeles (0.34)
- North America > United States > Nevada (0.26)
- North America > United States > California > Los Angeles County > Compton (0.06)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Air Force (1.00)
Dozens of anti-ICE rioters arrested in LA as Trump sends in National Guard to quell violence
Fox News' Jonathan Hunt reports the latest on the anti-ICE riots in Los Angeles. Correspondent Rich Edson details Dems' response to Trump deploying the National Guard and 'Outnumbered' co-host Kayleigh McEnany weighs in on the escalation. Dozens of protesters have been arrested following a weekend of violence across Los Angeles as tensions hit a boiling point over immigration raids throughout the city. On Sunday, law enforcement officials from multiple agencies arrested 41 protesters as anti-Immigration and Customs Enforcement (ICE) demonstrations spiraled out of control. Of the nearly four-dozen arrests, 21 were made by the Los Angeles Police Department (LAPD), 19 by California Highway Patrol and one by the Los Angeles Sheriff's Department.
- North America > United States > California > Los Angeles County > Los Angeles (1.00)
- North America > United States > California > Los Angeles County > Compton (0.07)
- North America > United States > California > Los Angeles County > Paramount (0.06)
Adaptively evaluating models with task elicitation
Brown, Davis, Balehannina, Prithvi, Jin, Helen, Havaldar, Shreya, Hassani, Hamed, Wong, Eric
Manual curation of evaluation datasets is struggling to keep up with the rapidly expanding capabilities and deployment scenarios of language models. Towards scalable model profiling, we introduce and validate a framework for evaluating LLMs, called Adaptive Evaluations. Adaptive evaluations use scaffolded language models (evaluator agents) to search through a target model's behavior on a domain dataset and create difficult questions (tasks) that can discover and probe the model's failure modes. We find that frontier models lack consistency when adaptively probed with our framework on a diverse suite of datasets and tasks, including but not limited to legal reasoning, forecasting, and online harassment. Generated questions pass human validity checks and often transfer to other models with different capability profiles, demonstrating that adaptive evaluations can also be used to create difficult domain-specific datasets.
- Europe > Sweden (0.04)
- Europe > Netherlands (0.04)
- North America > United States > Virginia (0.04)
- (7 more...)
- Leisure & Entertainment > Sports (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Education (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.92)
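The search loop described in the Adaptive Evaluations abstract can be sketched generically: an evaluator agent mutates seed tasks, the target model answers, and a judge keeps only the variants the target fails on, mutating those further. All three callables below are stand-ins for scaffolded LLMs, not the authors' actual agents:

```python
def adaptive_evaluation(seeds, propose, target, judge, rounds=3):
    """Breadth-first search for failure modes: tasks the target fails
    are kept and mutated further; tasks it passes are discarded."""
    hard_tasks, frontier = [], list(seeds)
    for _ in range(rounds):
        next_frontier = []
        for task in frontier:
            variant = propose(task)         # evaluator agent: harder variant
            answer = target(variant)        # target model attempts it
            if not judge(variant, answer):  # failure mode discovered
                hard_tasks.append(variant)
                next_frontier.append(variant)
        frontier = next_frontier
    return hard_tasks

# Toy run with stub models: the target always answers "unsure",
# so every proposed variant is collected as a failure.
found = adaptive_evaluation(
    ["seed question"],
    propose=lambda t: t + " (harder)",
    target=lambda q: "unsure",
    judge=lambda q, a: a != "unsure",
)
```

The collected `hard_tasks` form exactly the kind of difficult domain-specific dataset the abstract says transfers to other models.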
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Gur, Izzeddin, Furuta, Hiroki, Huang, Austin, Safdari, Mustafa, Matsuo, Yutaka, Eck, Douglas, Faust, Aleksandra
Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, performance on real-world websites still suffers from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those snippets. We design WebAgent with Flan-U-PaLM for grounded code generation, and HTML-T5, a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success rate on real websites by over 50%, and that HTML-T5 is the best model for various HTML understanding tasks, achieving an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and state-of-the-art performance on Mind2Web, an offline task planning evaluation.

- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (22 more...)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
- Banking & Finance > Real Estate (0.33)
- Information Technology > Communications > Web (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
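WebAgent's modular recipe (plan, summarize, synthesize code, execute) can be sketched as a single step function. The four callables stand in for HTML-T5 (planning and summarization), Flan-U-PaLM (grounded code generation), and a sandboxed program executor; all names here are illustrative assumptions, not the paper's API:

```python
def webagent_step(instruction, raw_html, plan, summarize, synthesize, execute):
    """One agent step: decompose the instruction into the next canonical
    sub-instruction, condense the page to task-relevant snippets, then
    generate and run a program grounded in those snippets."""
    sub_instruction = plan(instruction, raw_html)
    snippets = summarize(sub_instruction, raw_html)
    program = synthesize(sub_instruction, snippets)
    return execute(program)

# Stub run: each stage just tags its input so the data flow is visible.
result = webagent_step(
    "find the cheapest flight", "<html>...</html>",
    plan=lambda ins, html: f"sub({ins})",
    summarize=lambda sub, html: f"snippets({sub})",
    synthesize=lambda sub, sn: f"program({sub}, {sn})",
    execute=lambda prog: f"ran {prog}",
)
```

The summarization stage is what makes the long-context problem tractable: only the snippets, never the full HTML document, reach the code generator.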
RARR: Researching and Revising What Language Models Say, Using Language Models
Gao, Luyu, Dai, Zhuyun, Pasupat, Panupong, Chen, Anthony, Chaganty, Arun Tejasvi, Fan, Yicheng, Zhao, Vincent Y., Lao, Ni, Lee, Hongrae, Juan, Da-Cheng, Guu, Kelvin
Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
- Europe > United Kingdom (0.68)
- Asia > Middle East > Jordan (0.04)
- Indian Ocean > Red Sea (0.04)
- (22 more...)
- Media > Television (1.00)
- Media > Film (1.00)
- Government > Regional Government (1.00)
- (3 more...)
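RARR's research-and-revise loop can be sketched at the claim level: retrieve evidence for each claim, preserve what the evidence supports verbatim, and minimally rewrite the rest. Query generation, search, the agreement check, and the editor are stubs here, not the paper's actual models:

```python
def rarr(text, generate_queries, search, agrees, revise):
    """Attribute then edit: retrieve evidence for each claim and
    minimally rewrite claims the evidence does not support, keeping
    the rest of the output untouched."""
    revised, report = [], []
    for claim in text.split(". "):
        evidence = [search(q) for q in generate_queries(claim)]
        if all(agrees(claim, e) for e in evidence):
            revised.append(claim)             # supported: preserve verbatim
        else:
            revised.append(revise(claim, evidence))
        report.append((claim, evidence))      # attribution report
    return ". ".join(revised), report

# Stub run: the agreement check rejects any claim mentioning "Mars".
fixed, _ = rarr(
    "Water boils at 100 C. Humans live on Mars",
    generate_queries=lambda c: [c],
    search=lambda q: "encyclopedia snippet",
    agrees=lambda c, e: "Mars" not in c,
    revise=lambda c, e: "[revised] " + c,
)
```

Returning the attribution report alongside the revised text mirrors the system's two goals: evidence for every claim, plus a post-edited output.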
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Khot, Tushar, Trivedi, Harsh, Finlayson, Matthew, Fu, Yao, Richardson, Kyle, Clark, Peter, Sabharwal, Ashish
Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as task complexity increases or when the individual reasoning steps are themselves hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach that solves complex tasks by decomposing them (via prompting) into simpler sub-tasks, which can be delegated to a library of prompting-based LLMs dedicated to those sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allow it to outperform prior work on few-shot prompting using GPT-3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler, solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task over smaller inputs. We also evaluate our approach on textual multi-step reasoning: on a long-context multi-hop QA task, we can teach the sub-tasks more effectively via separate sub-task prompts, and on open-domain multi-hop QA, we can incorporate symbolic information retrieval within our decomposition framework, improving performance on both tasks. Datasets, code, and prompts are available at https://github.com/allenai/DecomP.
- North America > Cuba > Guantánamo Province > Guantánamo (0.05)
- North America > Mexico (0.05)
- Europe > United Kingdom > Wales (0.05)
- (17 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
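The decomposition scheme in the Decomposed Prompting abstract can be sketched as a tiny controller that runs a task as a pipeline of delegated sub-task calls, threading each step's output into the next. The first-letter concatenation task below is one of the paper's illustrative symbolic tasks; the function and solver names are our own:

```python
def decomposed_prompting(task, decompose, sub_solvers):
    """Run a complex task as a pipeline of simpler sub-tasks: the
    decomposer names which dedicated solver handles each step, and
    each solver receives the previous step's output."""
    state = task
    for solver_name in decompose(task):
        state = sub_solvers[solver_name](state)
    return state

# Toy instance: concatenate the first letter of each word.
solvers = {
    "split": lambda s: s.split(),               # sub-task: tokenize
    "first_letters": lambda ws: [w[0] for w in ws],
    "merge": lambda ls: "".join(ls),
}
acronym = decomposed_prompting(
    "large language model",
    decompose=lambda t: ["split", "first_letters", "merge"],
    sub_solvers=solvers,
)
```

Each entry in `sub_solvers` could equally be a prompted LLM, a trained model, or (as here) a symbolic function — that swap-ability is the modularity the abstract emphasizes, and a solver may itself call `decomposed_prompting` recursively on smaller inputs.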
Future Of Work--The New HR Frontier: These Tech Startups Are Helping Businesses Adapt To A Remote World
Allan Jones has seen the challenges of running a small business firsthand. When he was 14, his father was sued for wrongful termination by a former employee of his Compton, California mini-market. Without the guidance of a human resources department or the finances to fight the suit, he was forced to hire an attorney and dip into Jones' college savings to pay the fees. This experience stuck with Jones, and in 2016 inspired him to found Bambee, a Los Angeles-based company that pairs HR managers with small and midsize businesses on a monthly basis. "I knew that small businesses did not have HR, and the primary reason was price," says Jones, 34.
- North America > United States > California > Los Angeles County > Los Angeles (0.25)
- North America > United States > California > Los Angeles County > Compton (0.25)